Mm-hmm. Mm-hmm. Mm. Mm-hmm. Find replace, yeah. Uh-huh. Oh yeah. How much should you want um I had a few times where I thought well, it could be this, but if you're not entirely sure that it could be, 'cause it c it could be a combination of something else, like put the guess down or leave it as ambiguous in some way. Right. There's a chance they could be. Yeah. Mm-hmm. Mm, that's what I've been doing so far. It took li it like the third time 'til I thought oh, you're saying F_ five. It just came out of nowhere. But if there's a chance that it's equally likely that they could have said something or they could have said something like ice cream and I scream, then just choose one and leave it, right. Mm-hmm. We just started our speech recognition assignment just now. You're an S_L_P_ person. You survived S_P_ one and S_P_ two. We have two new teachers for S_P_ two. And uh I'll ask you later on like if you still have your notes. Mm the hyph And you only use the hyphen I think when the sentence is definitely not continuing anywhere. As in like it's Mm-hmm. Right. It was the creaky voice. Creaky voice. But it is c really connected to the word. Can we hear it again? From the from the And sometimes with the I would be thinking, are you anticipating a or you're you t trying to say the and you're just saying the the the the the. Mm-hmm. Yeah. Yeah, yeah. I try to take consistency, if they always stress the wrong part of the word, then I think okay, I'll let that in, but if if it comes out and like one word is Ah. Well, at the moment we have a lecturer that can I can never quite s tell if he's saying problem or programme. Programme problem. And it I can't actually say but when I'm listening to, I'm thinking And I can't tell. Hope he's not involved with these actually. I hope he's not involved with this. That'll give some transcriber a headache. 
It's easy to, 'cause you're so used to t t tuning them out, and you know it's easy when I go back over I normally g like go back to the beginning, do a quick pass of the main sections again and think ah yeah. Mm-hmm. That's what I thought the first day as well. Mm-hmm. Right. Mm-hmm. Yeah. So if they pause, just where you've got it highlighted at the moment That wor the hyphen. So I have to re-do the meetings from the minutes from the last meeting. Was she was that a like w r real definite I'm starting a new sentence here, 'cause I'd sometime if it sounds like he could carry on saying the whole sentence, I sometimes don't have a hyphen there. But is it a t any hesitation? Okay. Okay. Right. Yeah. Yeah. Just one to the next door, yeah. Oh goodness, who would know what that was uh if somebody that didn't know about linguistics were trying to transcribe S_ node. Yeah. Yeah. Yeah. Sorry? Yes, yeah. You're three lines in. What's the What's the C_? Is that a There's a d judgment is that that Oh, I've not used that button yet. In the dictionary here it's one word. Yeah. Would give Emacs a headache as well. Yeah. Yeah, the C_. Um She was cutting off rather than saying she would like s to have see something. Is a whole word. Yeah. 'Kay. Thanks, that's what I was asking. I should've just said it that way. Yeah. Uh-huh. Should it? 'Cause the loose uh is continuing from one sentence, isn't it? Was there? Okay. Oh yeah. Sometimes I've made buts after la like words after laughter, I've made them capitalised, 'cause I've not been sure if there's an if the laughter's been cutting everything off. Mm-hmm. During. Was that circuit? I figure if she is a person that calls circuit circuit I'm not sure if she did. Just um if it's her, I just caught the end of it and No, not for that one. But I had one person who said circuit. The. Mm-hmm. Yeah, something like if ha I did that one 'cause it is well, a language teacher wouldn't low allow that to mo move on to the next stage. 
That wouldn't they wouldn't let that pass. Sort of. Or I wouldn't. If somebody that, I'd point it out to them. Yeah, 'cause otherwise you could end up putting a hyphen after every hesitation mark that you have. Just if Just a prosodic marker rather than an actual hesitation. Uh as in a whole s like back-channel. Oh yeah, sorry. I th I just read the rest of the sentence. Yeah. No. 'Cause it's not linguistic. Well, not yeah. I know, I yeah, you could think of an argu Or should we start it if we start the conversation in the common room, like is laughter linguistic, then they could come up with a theory. But Oh, d Oh, I just noticed the the random yeahs, like with having the hyphen before they're s Uh if you scroll up a little way. Yeah, that ha um the second one from the top. Thinking I'm gonna have to go back and find my places where I've had someone pausing and then saying yeah yeah and then this that I've not had I need to go back and find uh p put hyphens in. Hyphens or commas? Yeah. No, s alright. No, I've I've th I've sort it out. Alright, okay. So we don't have to worry about commas. Okay. Yeah. I I read somebody's P_H_D_ proposals yesterday and s and it's saying, well, you could have a comma in here and one there and Cutest? Thank you. That one step behind. Oh. So which did you dec uh did Evaluation. The z z Evaluala? Yeah. I've had one person with a dodgy headphone with with dodgy something where most of the time it's it's as if they've got their hand over something. Yes, the uh yeah, it w it's for Right. Hmm. Right. Thank you for the warning. Right. Lot of people that say is this on and they hit it a few times. No. How about names that somebody says the name of somebody and you think, who? And you don't know how to spell the name or if you heard the right name. Sorry? Mister Ed. Oh it Mister Ed. Right. Mm-hmm. Okay. What was Mister again? I can't remember seeing how to transcribe Mister. M_ M_R_ Mister Sorry? Missus? Um Right. Is that Yeah. 
Not come across any yet. Oh. And not accidentally translating in the process. Oh yeah, I saw it there. There it was. Okay, oops. Checking their own transcription. No transcriber will be checking their own transcriptions. Guess it depends how far you can stretch. Oh yeah, 'cause every time somebody'd be going over there you'd hear all this sound of And about five or six breaks in the between is the making it a. Looks like some piece of medical equipment. Mm-hmm. Oh yeah. Mm-hmm. Oh. Mm-hmm. I just remembered one. Okay, when it's just as 'kay, but it's at the beginning of a sentence, should the K_ be capitalised? Good. You've been doing this too long. Grease paint. Mm-hmm. Mm-hmm. The arrows by the resolution. Not that one. The other arrow next to it. Yeah. Yeah. Oh. Ooh. Ah. That might have made it s 'cause I I appear to have nothing. But I was hearing stuff, but I g there was nothing on my wave-form. It's kinda hard to find where everything was. Control panel. Right. I think I've gone a bit far in cutting mine down at the moment. Mm-hmm. No, probably not. No. Mm-hmm. Easy for checking at any rate, rather than having to listen to a whole Anything. Mm-hmm. The work allocation page. Right. It do yes. Checking in page. Oh right. Mm-hmm. Uh it's vaguely connected question. Um, you You'll know. Um originally the like when we first like replied, say yes if you're interested in doing this, it said six weeks, and the contracts say eighth of May, which is longer in advance than which is a greater amount of time than six weeks. That's Okay. Right. Okay, that's. Cool. Right. When it gets into the holidays and we're not necessarily around in the holidays and and Right. Just checking. Or else you can like p yes. And if you can plug microphones into Apple Appleton Tower computers, I can do it there as well if you wanted me We could meet up and have a session. That's a nice way to put it. 
I'm having sort some s I'm having to sort some tax stuff out before I can even send my stuff in, 'cause I need to make sure I'm sending them the right form. But they won't even know what examining board I got my A_ levels in. Yeah, for human resources. I thought, surely you don't need that information, 'cause it took me it was about to take me half an hour well it did when I was like just going through the forms and filling in all the details, I thought I'm not going to spend w A couple of hours getting photo-copies of degree certificates to give you for your files if I'm just gonna be here a good th they won't I think it th at one place it said give us copies of degrees? Oh, maybe I'm getting mixed up with something else, but it wanted a A_ level like examining board, which seemed a bit ridiculous. Right. Mm-hmm. Yeah. Didn't think so. Mm. Alright. Oh, so it is your baby then on the photos. One assumes this. How old now? Oh. Yeah, that's the question I was just gonna ask, does he sleep? Wow. Alright. Oh, do we have to tick our note thing again when we finish this? To say that we've
If not more, yeah. Okay. Okay. So the three of you and let's see m um myself and Maarika, who's not here, and but I think so it it occasionally Melissa will all be doing checking of the transcriptions. So this is the second pass these transcriptions are receiving and that's pretty straight-forward. I think you all kn basically have an idea what's involved. Um but the extra step that we're asking you to do f in this case is to look at the silent regions and listen to them for each channel and um make sure that no back-channels were lost for example, um yeah, back-channels is what you n typically would find, although by and large these are gonna be silent. So So you can um start s checking by y well you you know, you can order it in any way you like, but um start with the first channel and go through it entirely. Um I would recommend, I've found that it's easier when you're checking and you're trying to make sure the boundaries are right for a speech segment, I would recommend, like say this one, go down here to the uh trascription transcription tier and select that it so that uh when you play, you're certain that when it stops, you've caught the whole speech signal for that utterance and haven't lost any signal information. Um otherwise it it's liable to keep on running, and you might not notice where it where it finishes. So it it takes a little time. But um Alternatively, I you you could just listen to the speech segments again and make sure everything is correct for those, and so you could just hit um And then Like that. And then go back and check the the silence regions. How it if if you wanna make your job a little more interesting, you can d do it in m different ways on a different day. It doesn't matter. As long as it gets done. So Yeah, yeah, you're doing so that's that's the role of the checker is to check what's said in the silent regions uh in the speech regions and then make sure nothing was lost in the silent regions. 
You're also um responsible for resolving any um uncertainties that were marked by the original transcriber. And and you know, of course there are gonna be m many situations where you like the first transcriber can't make any sense of it. And in that case, it will be up to you to use the at symbol. Um and that can be used uh to represent one um unintelligible word or multiple unintelligible words, but just use one at symbol. Um yeah, mm-hmm, exactly. And so that um once you go through all the s um speech regions and all the um non-speech regions, an o you could make another step of doing a search using um going under the edit menu you can do a search for, yeah, find replace for open parentheses and that'll take you to any that it any that the first transcriber might have marked out. So there's one. And then you can just resolve those all one by one as as a different set of um changes altogether. And and likewise for the double question marks. Just once you've gone through um and resolved all the open parentheses, you can do the same for op uh double question marks. And um and like I said, if you if you can't make any sense of it, then just use the at symbol. If you can um, just give it, you know, give it your best guess in these cases. If Mm-hmm. Yeah. Yeah, it's I would uh if you're v if you're quite certain, then I would go ahead and um type what you think it was said. If if you v think that n y there's maybe a th thirty percent chance you're wrong or or more Yeah. Right. So maybe a good idea would be to listen to each channel one by one, do listen to the silent and non-silent regions and make all those corrections, leave the unknowns, the um ambiguities, the uncertainties uncertainties until later. So go through all the channels and do that, and then um then you should have the context from all four speakers, and then go back and do those searches one by one for each channel. How would that be? Does that sound good? Okay. 
You'll you'll know all too well what was said in this meeting by the time you Yeah. Yeah, I think that would be really helpful. That would help us um, you know, nab a good number of those uncertainties. Mm. Choose one, it c I mean y think of it from a speech recognition point of view again. And always this should be this should motivate you. Yeah. What is most robust for a speech recognition and and leave out anything So you're well geared towards this task. Um So I've gone through Sorry. Oh no no no. Go ahead, go ahead. Too many. Okay okay. So um this is one I've gone through. I've just done one channel. And there were lots of lots and lots and lots of things as you can see. 'Cause I've got square brackets around all the things that need to be changed. Um so this person didn't ha um th obviously didn't know about capitalisation. So just about every utterance needs to be capitalised and and needs the end punctuation. Yeah. Um, no no no. Whatever um makes sense to you. Um but no, it it can continue into the next segment and that's perfectly fine. Um Yeah. Yeah. I'll review some of these things um after I go through this a bit um just to re-cap. Th these are things that I've after checking um t everybody's work um, these are things that were commonly left out or, you know misc mis-transcribed, whatever. So I'll go over those t as an opportunity to use the white-board. Um um okay. So uh this person left out a lot of end punctuation and um here's a case where I think she was transcribing an out-breath um or possibly an in-breath uh t uh in and used the hash mark. In any case, I didn't hear anything and and you shouldn't be transcribing in-breaths or out-breaths. That's n information we don't need. Oh, you know, this is the head-set mic and I heard something there. It was but it wa yeah, it was a bit of a yeah, it was creaky, so yeah f uh Yeah, well I didn't think it was im let's see, wait. Yeah. 
S there's there's actually quite a bit more in there than than I heard before. This is so I w I heard a creaky voice, and it was audible of the level of the rest of the speech. So I would I would keep that. I think that's good. Oh, okay. I was adding that, that's what I was doing. That makes sense. Okay, there you go. Um so there's things like that. Um sh there's a missing false start here um that didn't get caught. So I added that in. Um so that that's something that's quite common in the transcriptions that I've looked at. Missed false starts, um missed m Try to get it right. I mean the it I think syllables should be accounted for for th Yeah, well don't let it drive you crazy, 'cause there is one speaker Melissa just played for me today and he stutters horribly. I mean I I I'd you know, it's unfortunate for him and for us 'cause it's really gonna be a nightmare to transcribe. So you will have to draw the line somewhere. But if you c if it's say four repeated um if the word is repeated four times, uh then that's not too hard to count. I haven't come across something that's been I haven't come across anything exceeding four, I don't think. Have you? Mm-hmm. Mm-hmm. Uh-huh. Yeah, yeah. Yeah. Yeah. Make your best guess. Try to listen to it carefully enough, but not not so that you're obsessing over it for too long. I would say. Alright. Well there again, f uh think about robustness and speech uh recognition and it really doesn't matter. If it's a word fragment, who cares. It you're marking it as a fragment, and that's what's important. Well um I wouldn't de be using the star so liberally um for a non-native speaker. Mm-hmm. Well, right, right. Um I mean and and for that matter a lot of n native English speakers' realisations aren't going to be so true to um Yeah, yeah. Yeah. Well, I wouldn't bother to mark all of those instances though. No, I mean yeah, they're they're uh you're gonna find that for any speaker, they have a set way of of realising these things. 
And and it's just like Fiona was saying, um that the the word that stands out is something really uh aberrant. Like um I'm thinking of I I've used this example before, there's a guy who says project instead of product and uh it's something like project and it's probably just his his pronunciation of of. Yeah. Well, it it they were clearly talking about a product. So so uh I m I marked that as a m uh mispronunciation or s or something to flag anyway. Oh, evaluation. Yeah, I I've I remember that. That one I would I would flag, just like you did. It that's something definitely to mark up. Mm yeah. That's good. That's good. Alright. Yeah, I kn I don't know what to s tell you about n um certain speakers. I it's just something we have to deal with. Yeah. Yeah. Okay. So oop, I lost my lapel lapel mic. Yeah. Wait. Oops. Least I won't be transcribing the What do I care? Okay, so um there've there've been lots of missed filled pauses. Everybody seems to miss those, some more than others. So just really keep an ear out for those. Um Mm-hmm. Yeah. Yeah. I I'm not sure, I miss them plenty myself. But um this this second pass we should really try to catch all of them or at least most of th yeah. No, no. After. Okay. Yeah, it should be after every um acronym letter. Yeah, yeah. So at this point the um the only file that's been checked is Melissa one that Melissa did, and that was checked by me. Um so w we we're starting fresh with these and, you know, changes like that that were just announced we'll we'll be able to catch for these first passes. So yeah, y if you do see um something like um U_H_M_ for um, you you would change that to the regularised v version. Um and we just ch you know, this is arbitrary, but we had to choose something and not let it get too convoluted with um with choice. Here's Uh-huh. Mm-hmm, there. Um I Yeah, yeah. So it's not necessarily restarting the sentence, but just a restart of any type. A phrase. Yeah. Thank you for that description, Beata. 
Here's a uh so that answers your question about that answers your question about this one up here? Okay. Here's where I've I heard a another sound, another false start. Um or a stutter, I would say. Let me see. Oh yeah. I'm not seeing where this is. Okay. Okay, okay. Okay. So, that kind of thing. And there was a vowel in there, but you noticed that she just used the and that's fine. The the letter D_. That's she would like see. She Oh, sorry. I didn't go f back far enough. Yes. Um where is that. Yes, she's so she spelled it wrong. Um uh up here. Yeah, so I d I mentioned that because of spelling. And yes, any g I It's one word. So I d I just said spelling. It's an admonishment. Sorry? Uh like A_T_ is uh Mm-hmm. Yeah, yeah, yeah. If you have any questions about spellings, I i I think I m might have sent an email to this effect, just check the Oxford English Dictionary online. Uh and that's that will be our guide. Um, so a as far as I know, this is just missing an E_ cetera. Um but anyway. You can We've considered experimenting with running this through a spell check. You could take this as an Emacs file and run a spell check on it. But I think it would slow things down too much. So just hopefully i in going through this very carefully you're going to notice these misspellings. Mm-hmm. Mm-hmm. Yeah, yeah. Right. Yeah, I th I think you're right. So we'll we will just hope that you catch it. Um That's what we'll do. Okay. So Sorry. So I think she just she didn't catch the full form. Right, right. Yeah. She I think she meant to use the infinitive, but she just left off the to. A space between She would like see. She would like see. Yes yes yes, you're right. Yeah. Exactly. Uh that was what you were m remarking about. Okay. There we go. Let me save that. Okay. What else? Um be careful about um finding mis-transcriptions. So um So she t she transcribed that as would. And so Well, I think it's quite clear there. 
I mean obviously there are gonna be ca No, you try to keep it as true to what's being said. Then use um Mm-hmm. I'd have to hear it, because it m sounds like it could be a fragment. Uh it ju I'd just have to hear it. Um M I'd you don't want to b um be prescriptive and and type what you think they should've said. But but I I I do know the type of scenario you're describing. I just it's just hard to answer that without hearing something. Mm. Yeah, they should all I I stopped marking them, 'cause there are just too many. But Uh let's see. Oh yeah, y you're right. Right. That's right. Okay. There was a yes in there. Um and Yes. So another missed false start. Um There was creakiness on the she. But you don't need to mark that as a separate sound. Um Mm. If it sounds like um the laughter is just um interrupting a an utterance that continues after the laughter, I wouldn't worry about capitalising. If you hear just laughter on either side, yeah, do that. Yeah, yeah, mm I don't know, we'll have to I'll Okay, so laughter during speech. I'll f I'll talk to Melissa and Jean. S Mm-hmm. Mm-hmm. Mm-hmm. No, we don't have any qual tags like they use. But um I'll f I'll I'll figure that one out and put it up on the wiki or email you. Good question. I've come across that as well. Mm, see. Oh yeah, I don't know what that that noise was indicating. Um we can skip on. There are other types of things I wanna point out. Um let's see. Okay. Did you Did she say circuit? I didn't really hear that. So would would you have been tempted to put an asterisk after that? Uh yeah. Oh, okay. No, not necessarily. No, it's just something that the speech recognizer is liable to trip up on, I think. Mm-hmm, mm-hmm. Um y that may be, but I probably would flag it. I depending on how how different the vowel sound was. It's gonna be your call in a lot of these cases. Mm. Okay, here Here's a case of um where I found it n uh found a case of discontinuity. 
But you know, because she's a non-native English speaker, I don't I don't know. Maybe that was grammatically n uh unmarked for her. If you know what I mean. Maybe that wasn't uh her re-starting, maybe sh that was Mm mm-hmm, mm-hmm. So again I don't know. That w that was my reflex at the time to put that there. But Yeah. Well Unless you kinda get Right. Uh yeah, yeah. Yeah, I would type it as two. Mm okay, I'm gonna skip ahead. Oh, let's see, there was a c So I I hear she would quite clearly there and I think a good number of them you'll be able to resolve just just helps to have the context and be able to listen to it again. Well. Okay. Yes, it could be a lot of things. Um So there is there is another case where there wasn't a space between uh a non-word fragment and um a re-start. Okay. What else. There is an extra little symbol in here, and that was just kinda carelessness. Um So d even if it's something as brief as mm-hmm, we're we're gonna go ahead and capitalise that. And Maarika asked me what to do in the case of an utterance just consisting of laughter, do we do we use punctuation in these cases, and I don't know. I haven't been. I don't think it's necessary. Yeah. Uh no. Um Delete and collapse. I'm trying to underst I'm trying to interpret my note here. Delete and collapse the surrounding segments. Okay, I guess there there just really wasn't enough to transcribe there. Yeah. Yeah. So like I said, click down here just to if you haven't learned that trick already. And that won't play any surrounding segments. So there's nothing there worth transcribing. Um Uh Uh-huh. This one? Oh so you haven't put a hyphen there. I see, I see. Yeah. Okay. Commas. Commas did you say? Oh For g for that s type of situation? Yeah, okay. Commas are fine. I mean they're eventually they're going to be extracted globally, so it doesn't matter how you use the comma. Yeah, we can just wipe them out in one fell swoop. I I am too. 
I I like commas and then uh Steve Renals, who's project manager, had a look at m I think it was my transcriptions. What are all these commas doing in here? But he didn't he ultimately he said it wasn't a problem. They will I th they'll help in the case of discourse marking, but his point was that they're really subjective, I mean something that you the transcriber here Yeah, yeah. I really like using commas. Oh. Sh oh, she spelled cutest um with an I_, so that that's just something I pointed out. Okay, what else? Um So here's let's see. Now this was our So she typed uh our and I I didn't quite hear that. But But I you can see how it could've been, I don't know. Yeah, I don't know. Right. And you can never know, but if it fits well enough as a as an indefinite article, then you might as well. Oh, I was s I was suggesting that sh it be uh instead of our. But Yeah, I it's tricky, it gets ambiguous. Um I don't know. Uh what's this? Well oh, she's I hear the well there. The but this being a head-set mic channel m may make it easier to hear. She transcribed from the lapel mic. Uh Oh. Oh. Oh God. And you transcribed that speaker entirely? If that comes up again, we've got um headset mic files now. In fact, they replaced all the lapel mic files on the M_M_M_ server with head-set mic files, which we weren't asking them to do, but anyway they did it. And so now we don't have the lapel mic files. Apparently the headset mic files are are just a bit louder. And so you'll f probably find yourself turning the volume down considerably. But but apparently other than that they're just as good. But let us know, because we can retrieve the the other files. Yeah. Oh. Okay. I don't think I'm gonna go through uh any more of these, because I I think I've gotten through most of the types that I meant to point out. Uh right, right. Um what are we going to do for that. 
I d I did see s I've been looking at some guidelines for another corpus and I did see something about proper names, so I'll I'll have a look at that and see how we can handle that here. What Uh yeah, quite possibly, but since it's coming from ICSI uh it's coming from IDIAP in Switzerland, yeah, it w it should be easy enough to get, but it's it's one other step, you know, emailing somebody to find out who knows this information. Anyway, it's Yeah, yeah, because everyone signs a consent form when they participate in the experiment. So we ought to be able to get that. But but there shou we should also have a w a means of dealing with proper names when we don't know what it is or how it might be spelled at all and and flagging that in in some way. So I'll th I'll I'll try to address both things. Oh, just just type Mister fully. I M_I_S_T_E_R_, yeah. And same with missus, M_I_S_S_U_S_, I believe. Is it I believe I believe check check the dictionary. That yeah, but you would never um type the abbreviated spelling for anything. Um well, I'm sure there's a spelling in the O_E_D_, yeah. Uh-huh. Oh right. So the transcriber's enclosing everything in carets, and what you would do is That's a good question, because you will be leaving question marks there. Um okay. Right. Um it shouldn't, because things are going to be left like the asterisk will be left for the proc the subsequent processing. But carets with question marks in-between would have to be resolved in some way. Yeah. Yeah. Yeah. I I suppose that's all we can do really. You're not supposed to be polyglot transcribers here. Oh yeah yeah. If you if if f it's a foreign word and you know what it is, just enclose it in carets. Yeah. Yeah, yeah. If you if you're confident that you've got a fair grasp on French for example, why not. Yeah. Okay. Um there is there is just one other thing, she she typed okay um in caps. O_ c c O_ capital O_ capital K_ and huge no-no, I don't know why she missed that in the in the guidelines. 
Yeah. It is O_K_A_Y_. Yeah, yeah. Okay. Okay. Well, it will be your jobs to catch all of these things. So no um n no transcriber will be checking no checker will be checking their t own transcription. Yes. Yes. So we will catch these things more efficiently, I think. And on so I'll just go up to the board do I have to keep this stuff on? Oh yeah, they've got clips. Right, I'm mobile. Alright. Okay. So I guess just for the sake of using the white-board and killing a little more time I'll just write down the things that are most common to be checked. So let's see. So oh, okay. Now. Okay, s alright. Rather this is a list of things to do as the checker. So you wanna listen to the silent regions. Then check the speech regions. And as part of that you want to um check the boundaries. And oh, by the way, this should be um Um transcribe any missed speech. And just let's see what else. Okay. And within this checking the speech segments, you want to listen for or look for missed um Look for missed punctuation, filled pauses, words in stuttering. And if you have um if you have a stuttered word, you don't need to use the hyphen to mark discontinuity. It's just so if somebody says the the, then it's just the the. Um words in st uh And what else. Try to get f false starts. Short words, make sure with little words like the and uh, little articles and that kind of thing don't get omitted. Um short words and word fragments. Those are the commonly missed things. And then um look for cases of mis-transcription Seems so silly to be writing this all. But anyway, Mis-transcription, so make sure if the speaker said will, you type will not would or um if they say we, even though it might be more grammatically correct to type we're, um that kind of thing. Um tha cases of ellipsis, where um maybe the transcriber typed them, but really what was said was 'em, so where you use the apostrophe. And look for capitalisation I'm not gonna gonna do any more of this. Capitalisation. Spelling. 
Um, use the O_E_D_ to check on things you don't know about. Um refer to the wiki for all regularised spellings and f for things like filled pauses. Oh, was that W And then, yeah, capitalise the K_. Yeah. And I think smells really good. I used to always love smelling markers. What else did I find. Um the so make sure the da the dash, the hyphen, whatever you call it, is being used appropriately in cases of discontinuity, um but not in stuttering, like I said. And and don't use it when the utterance is clearly continued in the next segment or whatever. Um don't transcribe in-breaths or out-breaths. And okay should never be um two capital letters, it should be O_K_A_Y_. Um and one other thing is to break break up long segments if you find any. So Yeah, but that's a kind of crude guideline. I'd I'd say just anything that seems unmanageable. Ah. Yeah, yeah. Really uh it it means nothing to say a minute, it's just a g a really crude guideline. I would just try to keep segments very manageable um and a but u you know, use these wave-forms to help you. And you can adjust the resolution so that Oh. That didn't help me too much, but you can use this little slide bar to adjust the resolution. Actually that doesn't seem to be working. Which o Yeah. Right, okay, okay. So yeah. Mm-hmm. Oh okay, right. Uh-huh. Hmm? Ah right, that's good. Okay. That's excellent. Alright. Oh that's that's good to know. Okay. Yeah, so to help you make cuts, just look look f you know, use this wave-form and make sure you're not l losing any signal information. But um cutting segments down is a l a low priority, but it it will help ultimately, I think, post-processing wise. I don't Yeah, you don't don't get excessive about it. 'Cause that just ends up taking too much time. But I d I think from a speech recognition point of view it shouldn't matter, these segments being whatever length that they happen to be. 
I've seen other p people's transcription guidelines like f for the um ICSI project at Berkeley and elsewhere say indicating that it really doesn't matter and it's the point is just to get 'em into manageable units. And Yeah. Yeah, yeah. Yeah, that's true actually, that's the best reason for cutting them down, but hopefully the that'll be done during the first pass, because that's the whole point. Making the checkers' job a little easier. Um and then, yeah, just m um be sure after you've gone through the four channels to um do a search on all the unknowns, all the um cases of ambiguity or whatever. And what else was I gonna say? The way we'll be allocating this is is the same as before, th it's gonna be on the th the same the same page, just further down. Scroll down, you'll see w all the files. And uh I've already allocated some some things to be checked. So if you run out of work, go ahead and start checking, because this is something that's um considerably um bottle-necked at the moment. Yeah, I'd say for now um w we don't even have uh much new stuff to be uh to be gone over for a first pass. So it's important that we d um do this second pass. And I've indicated in a column whether something is priority. M I think I high typed high if it's high priority. Um so d those would be the ones you would focus on first, obviously. And when it come you don't want to change the file-name in any way. Just keep it the same. Um there there is going to be a checking in procedure. I don't know if anybody's mentioned this before, but you'll you'll know about this with the C_V_S_ server. So th Right, but it's be s y similar. But they're they're creating a their a guy Jonathan is creating a web form for you to do this all very simply and like just um upload everything that you've uh up upload the file that you've transcribed and and you'll be noting whether it's the first pass or the second pass and y your name um as the person who did that particular pass. 
And that will happen soon. I'll let you know. Yeah. Yeah, you'll have to I would suggest creating a f folder um called files to be checked or something like that. Yeah, and then within that create a s create sub-folders for the the given meeting and you'll have to get all the sound files unfortunately from the M_M_M_ server on your own, because the the AMI server's not very reliable for getting these sound files uncorrupted. So you'll have to get all the sound files from the M_M_M_ server and the person's transcription file should be in that folder where you're putting yours. Um and then you'll just uh yeah d you'll copy that over, you won't don't move it over, because we wanna have multiple versions of these things, but yeah, copy it into your directory and keep the same name. And then when you check it in, it will automatically be registered even though the the first pass was checked in when we're doing this checking in business, um it will know which pass it is, but I think you'll also be indicating that as well. Uh-huh. Mm-hmm. Yeah, it's it' actually six what is it? Is it three months, I think? Yeah. Yeah. They're they're three-month contracts and now, if you can't keep that, that's fine. But it's just a logistic thing with personnel where where we ha that was the minimum we were able to offer a contract for. So if you have to bow out, let us know as soon as you know when you you're gonna have to stop. Yeah, yeah, obviously those kinds of things are things we'll have to work around, but um yeah, it's just one of those details. Mm-hmm. I should think so. I mean the whole point w of keeping well, as long as you're able to, yeah, work out use of your computer with the other person, that's fine, yeah. The the whole point of having m regular hours was to make sure that somebody was here to answer your questions. But you've got keys, so so you should be able to get in during odd hours if if you wanted to work. Yeah, yeah. 
Yeah, I don't see why not, as long as you're, you know, doing your given set of hours for the week. Yeah. And um yeah. Paym I I think you'll all be paid. I think you'll all be paid uh well, have you not Mm. Okay. Okay. Really? No, that's certainly not necessary. Mm no. Well it If you have any th problems like that, mm um see Caroline Hastings. I mean if it's for this job and and getting paid, that that's not necessary and Caroline Hastings ought to be able to help you sort that out. But j also if if pay is a matter um like if it's a matter that you haven't gotten paid for some reason, um I hope that's not not the case. Mm-hmm. Talk to Caroline. Yeah, 'cause I've I really don't know um why that would be. I'm here full-time now, so I'll I'll be here m Monday through Friday and I l I leave at four now, so you won't find me anytime after that. But Yeah, my husband's at home with the baby. Yeah, yeah. He's uh four and half months. It's very lovely. A nice jolly little baby. Yeah, well um only as of Friday actually. So he was uh I was just telling Melissa this do you want us to stop? Yeah, yeah. Yeah yeah. Oh good, that worked out well.
More. It's usually a context problem, because I think sometimes if it's not intelligible you either tend to because you really want it to fit the context, um you tend to take the closest actual word that would fit the context, or if it sometimes it sounds very clear as a word itself, but then wouldn't make sense at all, and then you m question, you know, second-guess whether it's really what you heard. Mm. Hmm. Be robust. That's always fun. I've done that too. Oh, never I I don't wanna interrupt it. Uh uh when I did my masters um I took uh S_P_ one and S_P_ two with Simon King. Yes. And actually I've done quite well in S_P_ one, I've done it a bit worse in S_P_ two because it was a l a lot more challenging. Okay. Yeah. Right. Okay. Back to the topic. Yeah. I think that's actually the only case where you don't or where you're not supposed to capitalise, right? So if it's quite clear that the sentence is being carried on, the p you know, the sentence from the previous segment is being carried on, then that's when you don't capitalise. Otherwise you do. Yes. Yeah. Yeah. Yeah. And you sometimes you don't know really whether it's a mic noise or that person is making that noise. Yeah. Okay. And how exact do we have to be there? Because um you know, some people tend to stutter more than others and then they would have a false start four or five times in a row. And how important is it to listen to it really until you've got the exact number of a things right? Yeah. Yeah. Well, 'cause sometimes they go you know, like that, and then you're not really sure whether it's three times or four yeah. Yeah. Yeah. Mm-hmm. Yeah. 
Well, it's not that the word is if the word were the whole word were being used four times, that wouldn't have been that difficult, but sometimes it's just a false start, which is really just maybe one or one and a half phones and then repeated and then you don't really know whether one is a long duration of one or whether it's actually two, and that kind of thing, so it's I think it was a word starting with D_ and the person was going or s you know. Yeah. Hmm. Yeah. Well, speaking about Speaking about robustness, if if somebody has an accent that you can only really understand what they're saying due to world knowledge yeah, it what they say, it doesn't really actually sound at all, you know, what it is, then do you transcribe more phonetically or do you put uh uh a star of after every single one word? Or Yeah, that's what I thought, that you're not supposed to do it, but um the one there's one I'm transcribing right now, you know, a whole batch of dialogues where uh all four participants are non-native, and there's especially one there that um I think he's Eastern European and I think that's the only reason I can understand what he's saying at all, because I'm Eastern European and uh but if we would just play it I think to a recognizer, then what he says, you know, the way he realises words, sounds nothing at all like it it should be. Yeah. Yeah. Yeah. No, it's less of a s it's less of a stress. It's really like uh his uh, I don't know, just the phonemes sounding differently. Well, you can't really, because it would be all of it. Oh. Yeah. Oh, I think I've transcribed him and I had project at all um evolution criteria. Yeah. Mm. Okay. Yeah. And one guy who was saying evolution each time he meant evaluation, and it took me a whole dialogue Yeah, yeah. 
Yeah, but that's really something I would always transcribe it, you know, for lack of better knowledge or lack of context, and then at the end of the dialogue it dawned on me that he actually meant evaluation, and then I went back and replaced I replaced it in the whole transcription. Yeah. Yeah. Yeah. Yeah, basically th I think just do your best and hope for the best. Well, we have both head-set and lapel transcription. So it's not too bad. Yeah, let's have Maarika transcribe that. Mm-hmm. Um sorry, I'm just seeing this with the T_V_, there is T_ underscore V_ underscore. I thought we're just supposed to use the underscore between letters? No? Okay, sorry, then I've done it wrong so far. Okay. But it is a back-track. Yeah, I think of it in terms of, you know, not even maybe to well, if I th if you think of it in terms of a syntactic tree, it n doesn't have to go back all the way to the S_ node, but it just m it might go back to the last node or something like that. Yeah. Well it's just my way of of knowing if it w sorry, 'cause linguistics is all I've ever known. Mm-hmm. Yeah. Where? Oh, et cetera. Is it one word or is it two words? Okay. Because I thought it would be like et alii. But in sh it isn't it in Latin like et cetera, like et alii or No no no, like et w you know, when you have uh when you have a citation and you say et al, it means and others, et et al et alii is the full word, and et cetera means and following. So Yeah. Oh in it is, okay, yeah. Mm-hmm. Mm-hmm, yeah. Mm-hmm. Yeah. I think Yeah, I think i if somebody could write I don't know how difficult it would be in practice to write a script that would ignore um all the word fragments, anything with a hyphen. Then a spell-check might make make sense.
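The script suggested here could look roughly like the Python sketch below, which skips word fragments before checking the remaining words. It assumes two things. First, a fragment is marked by a leading or trailing hyphen (e.g. `cir-`). Second, a plain set of words stands in for a real spell-checker's dictionary. `unknown_words`, `is_fragment` and the word list are hypothetical. Letters spelled out with underscores, such as `S_P_`, are skipped too.

```python
import re

# Letters plus the apostrophes, underscores and hyphens used in these transcripts.
TOKEN = re.compile(r"[A-Za-z'_-]+")


def is_fragment(token: str) -> bool:
    # Assumption: a cut-off word is marked by a leading or trailing hyphen, e.g. "cir-".
    return token.startswith("-") or token.endswith("-")


def unknown_words(text: str, dictionary: set[str]) -> list[str]:
    """Return tokens not found in `dictionary`, ignoring fragments and spelled letters."""
    unknown = []
    for token in TOKEN.findall(text):
        if is_fragment(token) or "_" in token:
            continue  # word fragment, or letters spelled out as in S_P_
        if token.lower() not in dictionary:
            unknown.append(token)
    return unknown
```

Because fragments never reach the dictionary lookup, they cannot pile up as false alarms that a checker learns to click past.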
But uh to just use simple Emacs, it will uh basically mark um all the word fragments for you, and uh, you know, and I know from experience, at some point you also get negligent, you just click okay okay okay, you know, that's not a mistake, and then you might miss a real error. So it doesn't make any sense. Mm-hmm. Yeah. Yeah. But see is a actual full word, shouldn't it there b shouldn't there be a space between see and the hyphen? You know, not to be not to be pedantic. After the no, the the C_ and the hyphen, shouldn't there be a space? Okay. Yeah. Um the would and w No. But sometimes it's really difficult, sometimes you really don't know whether they're saying will or would. Yeah, he it it's his yeah. But if if it's not clear, do I just pick uh do you just pick the more grammatic alternative? Or B yeah. But if it but if it's really not absolutely not clear, when they say instea you know, some people when they swallow up the the last bit. Or do we just then transcribe? Yeah. Yeah. Yeah. Mm-hmm. Yeah. Yeah. Mm-hmm. The um should be capitalised. Yeah. Okay. Oh, okay. No. Yeah. Yeah. Um okay, so what about if a whole utterance is uttered while laughing? Do you put the laughter in front? Or in what I usu when that happens what I do is I put laughter in front, then the whole sentence and laughter in, you know, at the end as well. So Yeah. No, but it's it's laughter th throughout. Yeah. There's some people laughing themselves silly at absolutely nothing while while they speak. Mm-hmm. Because in the ICSI dialogues um they always say, you know, while laughing, and we don't do that. So They say either so there it's really not ambiguous whether it's just laugh followed by an a normally spoken utterance or whether it's an utterance spoken while laughing. I think there is a distinction there and we don't have that. Yeah. 'Kay. Morse code. Morse. Yeah. But then I w but then I wouldn't put an asterisk on it. 
I thought asterisk is something where um it's c it's pronounced as a totally different word, like evaluation and evolution. No? Okay. Oh. Yeah, 'cause I this is a very frequent mistake that no native speakers make, to say circuit instead of circuit. Mm. Mm. Is push buttons two words? I used one word. Okay. Well it could be she would or should as well. Yeah. Mm. I haven't either. It's just too silly. I'm just thinking of uh you know an utterance consisting entirely of laughter and then putting like punctuation in, like. Sorry. Yeah, because it was it was really just silence and you're asking the person to just get rid of. Yeah. Mm. Yeah. Oh. Okay. Yeah. Yeah. Just to make sense of a sentence. What does cutest spelling mean? Oh, okay. Okay. Oh. Well, if you were Cockney, then Sorry. A_ or Yeah. Yeah. Yeah. Mm-hmm. Yeah, with some people it actually leads to some really screeching noises, because they're so loud or probably the mic is so close. I had this one um in one of the first dialogues I did, the g this guy called Ed, you could uh always see the waveform actually, you know, being just hitting um the b the boundaries on on both ends and just being at some points just black segments. And when he was talking there was this whole, like you know when you speak too close to a microphone, and that was really horrible. The rest. Hmm. Mister Ed. Oh, I had one guy, one of the project managers was um he had a really weird accent and he was calling um one guy Mister Ed. His name was Ed, and it took me a whole time to, you find out that it was actually Mister Ed. Or Yeah. No, but that it wasn't one word, that it wasn't like some proper name starting with M_. Yeah. Well, but don't you guys have um have this data? Know the people na people's names, like when you Collect the data, th aren't people filling out a form or something? Okay. Yeah. No, but it's information that we theoretically have. Mm. We didn't sign one. Mm-hmm. Mister just to Yeah.
No, M_ Mi U_S_ Mi Missus, the Missus. What if somebody calls themselves Ms? I mean it's not not that it ever happened, but yeah. Yeah. Yeah. Yeah. Oh, so even if we know what that word means we're not supposed to, right? Yeah. Yeah, 'cause I have there is one dialogue where I think two speakers are French, like one is definitely French, and I know she is saying and then somebody s saying something to her and she goes comment and then you know you know that, so yeah. Oh, not really, but on comment, yeah. Ok No. Okay. No. Their own. Probably. Yeah, I can hear the chairs moving and the Yeah. Oh, the white-board marker has a camera as well. M_. No well apostrophe and then K_ A_ Y_. That marker smells really bad. It probably became addicted to the fumes. Over a minute. Mm-hmm. Okay. So if something is a minute and three seconds I can still have it, right? Mm-hmm. Um if you just click on one of the arrows. Um in in the resolution thingy, um it will Yeah, on this one. No, not not the other one, this one. Yeah. Yeah. Oh, and you can also if you go to signal and Um that's it's something under signal, where you can adjust the amplitude as well, on resolution. Um Then you can also wait. No, I think that's not if you go to control panel. Signal control panel. No no, signal, control pan Um the vertical zoom, you can adjust that as well, so if it seems too flat for you or you can't really Yeah. Yeah. Yeah. Mm. Mm-hmm. Okay. Yeah. Mm. Mm-hmm. Mm-hmm. Mm yeah. Well, I don't know about I know about this for the summarisation, I haven't done this with uh transcription yet. Yeah. Mm-hmm. Yeah. Um I've got a three month contract, yeah. Mm-hmm. Yeah. Mm. Excuse me. Yeah, I haven't gotten paid for the first two weeks, 'cause um I w I only have a three month contract starting February, but I started mid-January. Yeah. Okay. Mm-hmm. Okay. Alright. Is your husband now a stay-at-home dad? He's sleeping through. Yeah.
Mm. Okay. Mm-hmm. In between. Yeah. So we so are we checking what's actually been transcribed already as well as the silences? Yeah. Mm-hmm. Oh okay. Oh, alright 'kay. Mm yeah. Yeah. Yeah, it does tend to help. Yeah. Once you've been through the passage, you've worked out what they're talking about, you know. Yeah. Mm-hmm. You know, when you get like um someone's talking and there's they sort of pause in the middle of a sentence that's long enough for it to put a break in, but they're actually sort of carrying on the sentence, do you have to capitalise each time you transcribe a bit if it's mid No no no no. Yeah. Okay. Yeah. Just okay. So it's put the hyphen and then Right. Okay. Oh right, okay. I've done a few of those wrong then when someone comes to check it. Okay. Okay. Yeah. Yeah. Okay. Yeah. Yeah. Yeah. Oh. Mm. Yeah. Yeah. Mm. You know, um you standardised all the the filled pauses, the uh and do we need to go through and change ones that people did differently? Like if I where I've written E_R_ for uh, change it all to U_ H_, that sort of thing? Mm-hmm. Yeah. Okay. Yeah, just change it to um. Okay. Yeah. Yeah. We do the meeting. The minutes, yeah. Mm-hmm. Mm-hmm. Okay. Where you've written at et cetera, is that how we're supposed to be writing it? Okay. Yeah. Et cetera. Okay. Mm. Okay. Okay. That one was clear. Okay. Yeah. How do you know it's that? Cir circuit. Oops. Yeah. Yeah. Yeah. Okay. Right. Someone's gonna have fun transcribing that now. Yeah. Mm-hmm. Okay. Actually, um a couple of times where they've repeated a word, I've put a comma between them. Should that be no comma? Oh, r okay. Right. Oh right, okay. Okay. 'Cause I get I get quite prolific with my commas at times. So um Okay. Okay, cool. Yeah. Yeah. Oh yeah. E_ S_ T_. Oh okay. Yeah. Sometimes you get like an uh and it g you're not sure if it's an, an A_, yeah, or an m it Yeah. As an. Yeah, do it as Okay. Several vowels in there. Yeah. That's harsh. Oh okay. Right. Mm. Mister Ed? Yeah.
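Two of the questions above suit a search-and-replace pass over a finished transcription: regularising filled pauses and dropping the comma between a repeated word. Below is a minimal Python sketch. It assumes, as in the question above, that `er` is regularised to `uh`. The `FILLED_PAUSES` mapping and `normalise` are hypothetical stand-ins for the full list of regularised spellings kept on the wiki.

```python
import re

# Hypothetical mapping: only "er" -> "uh" comes from the discussion above;
# the full list of regularised spellings lives on the project wiki.
FILLED_PAUSES = {"er": "uh"}

# Whole-word match on any mapped filled pause, in any letter case.
PAUSE_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, FILLED_PAUSES)) + r")\b", re.IGNORECASE
)
# A word, a comma, then the same word again, e.g. "the, the" (case-insensitive).
REPEAT_COMMA = re.compile(r"\b(\w+),\s+(?=\1\b)", re.IGNORECASE)


def normalise(text: str) -> str:
    """Regularise filled pauses and drop commas between repeated words."""

    def swap(m: re.Match) -> str:
        word = m.group(1)
        new = FILLED_PAUSES[word.lower()]
        # Keep a capital letter if the pause starts a sentence ("Er" -> "Uh").
        return new.capitalize() if word[0].isupper() else new

    text = PAUSE_RE.sub(swap, text)
    return REPEAT_COMMA.sub(r"\1 ", text)
```

The whole-word boundaries keep words such as "error" or "her" untouched. The lookahead removes the comma without consuming the second copy of the word, so runs like "the, the, the" collapse in a single pass.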
Or myst mystery Ed or something. Yeah. Mm-hmm. M_ I_ S_ T_ E_ R_. Okay, Missus. Missus. Yeah. M_ I_ Z_? Yeah. So um about um foreign words, what do we when we're checking um if we come up something so we just leave it as Yeah. I got one, which I it was French, so I knew how to ex spell it. Yeah. I mean if it's in carets and they know it's foreign, does that matter? Yeah, and the at symbol. Or could we just change it to an at between the carets? 'Cause it's only intelligible theoretically. Yeah. Yeah. Mm-hmm. Okay. Yeah. Are we doing okay just as O_ K_, not as a O_K_A_Y_? A_ Y_, alright, oops. Someone's gonna find some of those. Yeah. Checking their own work. Right. Have to remember that one now. D Yeah. Wrestling to the white-board. Wow. Okay. W Okay. Yeah, b Oh. Is that zoom in? Wow. I turned the page. Okay. So is this the priority then? Transcri yeah. Mm-hmm. Turn the page. Did you get the little buzz buzz? So do we just open the file as we open the files for the first pass, like open up the sound file and then it sh In your home directory. Okay. Yep. Okay. Then you'll lose it, yeah. Okay. Sh Yeah, someone said three months. Yeah. Oh, okay. Mm. Okay. Once we actually get into the like um Easter holidays, 'cause we're at having lectures at the moment, can we be more flexible with our hours if we wanna sort of I mean depending on who's using the computers. Yeah. Okay. Yeah, sure. Yes. Yep. If want Okay. Know what you're doing. Phew. Oh. Yeah. Yeah. Yeah. Oh. Wow. Okay. Oh yeah. Absolutely.
